RNA-Seq Data Analysis    ◾    179

$$ \operatorname{var}(Y_g) = \mu_g + \alpha \mu_g^2 \tag{5.19} $$

In the quasi-negative binomial distribution, the variance is modeled as follows:

$$ \operatorname{var}(Y_g) = \sigma_g^2 \left( \mu_g + \theta \mu_g^2 \right) \tag{5.20} $$
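The two variance models can be compared numerically. The following is a minimal sketch, not taken from any analysis package; the function names and the numeric values are hypothetical and chosen only for illustration.

```python
def nb_variance(mu, alpha):
    """Negative binomial variance, Eq. (5.19): mu + alpha * mu^2."""
    return mu + alpha * mu ** 2


def quasi_nb_variance(mu, theta, sigma2):
    """Quasi-negative binomial variance, Eq. (5.20): sigma^2 * (mu + theta * mu^2)."""
    return sigma2 * (mu + theta * mu ** 2)


# Hypothetical values for a single gene (illustration only).
mu = 100.0        # mean count
dispersion = 0.25  # alpha in Eq. (5.19), theta in Eq. (5.20)
sigma2 = 1.5      # quasi-likelihood scaling factor sigma_g^2

nb_var = nb_variance(mu, dispersion)                 # 100 + 0.25 * 100^2
qnb_var = quasi_nb_variance(mu, dispersion, sigma2)  # 1.5 * (100 + 0.25 * 100^2)
poisson_var = nb_variance(mu, 0.0)                   # zero dispersion: variance equals the mean

print(nb_var, qnb_var, poisson_var)
```

With a zero dispersion the negative binomial variance reduces to the mean, as in a Poisson model, and with a scaling factor of one the quasi-negative binomial variance coincides with the negative binomial variance.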

An RNA-Seq study design may include one or several conditions, called factors, and a researcher is usually interested in testing the effect of a condition. For instance, assume that a researcher wants to study breast cancer in women and conducts an RNA-Seq study on samples from healthy and cancerous tissues of five affected women. The analysis programs require a matrix that describes the design, called a design matrix. The design matrix defines the model (the structure of the relationship between genes and explanatory variables), and it also stores the values of the explanatory variables [32]. The design matrix is created from the study metadata, as shown in Table 5.1.

The design matrix includes dummy variables that set the level of each factor to either zero or one, as we will see soon.
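To make the idea of dummy variables concrete, the sketch below builds a zero/one design matrix from the metadata in Table 5.1 using plain Python. It assumes treatment (reference-level) coding in which the alphabetically first level of each factor is the reference; that choice, the helper names `dummy_columns` and `build_design`, and the column names are illustrative assumptions, not the output of any particular analysis program.

```python
# Sample metadata copied from Table 5.1: (SampleID, Condition, Patient).
metadata = [
    ("norm_rep1", "Norm", "Rep1"),
    ("norm_rep2", "Norm", "Rep2"),
    ("norm_rep3", "Norm", "Rep3"),
    ("tumo_rep1", "Tumo", "Rep1"),
    ("tumo_rep2", "Tumo", "Rep2"),
    ("tumo_rep3", "Tumo", "Rep3"),
]


def dummy_columns(values, prefix):
    """Return one 0/1 column per non-reference level of a factor.

    Assumption: the alphabetically first level is the reference level.
    """
    levels = sorted(set(values))
    others = levels[1:]  # every level except the reference
    names = [f"{prefix}{level}" for level in others]
    rows = [[1 if value == level else 0 for level in others] for value in values]
    return names, rows


def build_design(samples):
    """Build an intercept + Patient + Condition design matrix."""
    conditions = [condition for _, condition, _ in samples]
    patients = [patient for _, _, patient in samples]
    patient_names, patient_rows = dummy_columns(patients, "Patient")
    condition_names, condition_rows = dummy_columns(conditions, "Condition")
    column_names = ["Intercept"] + patient_names + condition_names
    matrix = [[1] + p + c for p, c in zip(patient_rows, condition_rows)]
    return column_names, matrix


columns, design = build_design(metadata)
print(columns)
for (sample_id, _, _), row in zip(metadata, design):
    print(sample_id, row)
```

Each row describes one sample: the intercept column is always one, and the remaining columns switch on when the sample belongs to the corresponding patient or condition level.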

The generalized linear model is fitted to the data of this study design so that the expression of each gene is described as a linear combination of the dummy explanatory variables.

$$ y = \beta_0 + \beta_1 \times \text{Patient} + \beta_2 \times \text{Condition} + \varepsilon \tag{5.21} $$

where $y$ is the response variable that represents the gene expression in a specific unit, $\beta_0$ is the intercept (the average gene expression when the explanatory variables are zero), $\beta_1$ and $\beta_2$ are the generalized linear regression parameters that represent the effect of each explanatory variable, and $\varepsilon$ is the error term. A log-linear model is used:

$$ \log \mu_{gi} = X_i^T \beta_g + \log N_i \tag{5.22} $$

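where $X_i^T$ is a vector of covariates (explanatory variables) that specifies the conditions/factors applied to sample $i$, and $\beta_g$ is a vector of regression coefficients for the gene $g$. In models of this form, $\mu_{gi}$ is the expected count of gene $g$ in sample $i$, and $N_i$ is typically the library size (total number of reads) of sample $i$.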
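A short numerical sketch may help show how the log-linear model turns a covariate vector into an expected count. It takes $N_i$ to be the library size of sample $i$, which is the usual reading of this form of model but is an assumption here. The coefficient values, the library size, and the column order (intercept, two patient dummies, one condition dummy) are hypothetical.

```python
import math


def expected_count(x_i, beta_g, library_size):
    """Solve Eq. (5.22) for mu_gi: mu_gi = N_i * exp(x_i^T beta_g)."""
    linear_predictor = sum(x * b for x, b in zip(x_i, beta_g))
    return library_size * math.exp(linear_predictor)


# Hypothetical coefficients for one gene, on the natural-log scale.
# Assumed column order: [Intercept, PatientRep2, PatientRep3, ConditionTumo].
beta_g = [math.log(1e-5), 0.0, 0.0, math.log(2.0)]
N_i = 1_000_000  # hypothetical library size (total reads in sample i)

x_norm = [1, 0, 0, 0]  # reference patient, normal tissue
x_tumo = [1, 0, 0, 1]  # reference patient, tumor tissue

mu_norm = expected_count(x_norm, beta_g, N_i)
mu_tumo = expected_count(x_tumo, beta_g, N_i)
fold_change = mu_tumo / mu_norm  # equals exp(condition coefficient)

print(mu_norm, mu_tumo, fold_change)
```

Because the library size enters as an additive term on the log scale, it cancels in the ratio of the two expected counts, so the condition coefficient is interpreted as a log fold change between tumor and normal tissue.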

TABLE 5.1  Sample Information or Metadata for the Design Matrix

SampleID     Condition    Patient
norm_rep1    Norm         Rep1
norm_rep2    Norm         Rep2
norm_rep3    Norm         Rep3
tumo_rep1    Tumo         Rep1
tumo_rep2    Tumo         Rep2
tumo_rep3    Tumo         Rep3